Evaluation of Terminology Extractors: Principles and Experiments

Author

  • D. Bourigault
Abstract

We claim that the evaluation of terminology extraction systems cannot rely entirely on comparing their results with a reference list, for two reasons: the very existence of such a list, compiled independently of any application, is highly questionable, and these systems yield not only actual terms but also potential (or candidate) terms. We propose instead to align their results on various corpora, so as to isolate the divergences and convergences and to understand their rationale. This paper compares two noun phrase (NP) extraction tools. It proved necessary to translate the NP parses produced by these tools into a pivot grammar. We developed a Frontier to Root Transducer based on syntax-directed translation schemas, which are used for standardization and optimization in the compiling process. These tried and tested compiling techniques provide an efficient yet declarative way of obtaining mappings between very different parses ("skeleton parses" / detailed constituent structures) and of evaluating the underlying

Similar resources

The ARC A3 Project: Terminology Acquisition Tools: Evaluation Method And Task

This paper describes the work achieved in the Concerted Research Project ARC A3, supported and coordinated by the AUF, formerly Aupelf-Uref. The project deals with the evaluation of term and semantic relation extraction from corpora in French. Eight participants, from both public institutions and industrial corporations, were involved in this project and were responsible for producing corpora suita...

Defining a Gold Standard for the Evaluation of Term Extractors

We describe a methodology for constructing a gold standard for the automatic evaluation of term extractors, an important step toward establishing a much-needed evaluation protocol for term extraction systems. The gold standard proposed is a fully annotated corpus, constructed in accordance with a specific terminological setting (i.e. the compilation of a specialized dictionary of automotive mec...

The Quæro Evaluation Initiative on Term Extraction

The Quæro program organized a set of evaluations for terminology extraction systems in 2010 and 2011. This initiative targeted three objectives: the first was to evaluate the behavior and scalability of term extractors with respect to corpus size, the second was to assess progress between different versions of the same systems, and the last was to measure the influence of...

Comparative Evaluation of C-value in the Treatment of Nested Terms

In statistical term extraction systems, the identification and selection of nested term candidates often present a challenge. The paper presents an implementation and evaluation of C-value, a heuristic that ranks and/or discards nested terms according to their stability in the corpus. The method was tested for English and Slovene; for both, the overall performance of the term extractor improved ...

NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data

In this paper, we present NERD, an evaluation framework we have developed that records and analyzes ratings of Named Entity (NE) extraction and disambiguation tools working on English plain text articles performed by human beings. NERD enables the comparison of different popular Linked Data entity extractors which expose APIs such as AlchemyAPI, DBPedia Spotlight, Extractiv, OpenCalais and Zema...

Publication date: 2011